An Analysis of Temporal-Difference Learning with Function Approximation

Authors

John N. Tsitsiklis

Benjamin Van Roy

Abstract

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with probability one), a characterization of the limit of convergence, and a bound on the resulting approximation error. Furthermore, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. In addition to proving new and stronger positive results than those previously available, we identify the significance of online updating and potential hazards associated with the use of nonlinear function approximators. First, we prove that divergence may occur when updates are not based on trajectories of the Markov chain. This fact reconciles positive and negative results that have been discussed in the literature, regarding the soundness of temporal-difference learning. Second, we present an example illustrating the possibility of divergence when temporal-difference learning is used in the presence of a nonlinear function approximator.
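The setting described in the abstract (online updates of a linear approximator along a single trajectory) can be illustrated with a minimal sketch. It is not the authors' code: it covers only the TD(0) special case, assumes a finite chain with a per-state cost `g[i]` rather than a per-transition cost, and uses a hypothetical step-size schedule `step0 * c / (c + t)`. The function name `td0_linear` and all parameter names are illustrative.

```python
import random


def td0_linear(P, g, phi, alpha, n_steps, step0=0.1, c=100.0, seed=0):
    """TD(0) with a linear approximator J(s) ~ theta . phi[s].

    P[i][j] : transition probability from state i to state j
    g[i]    : cost incurred in state i (simplifying assumption)
    phi[i]  : feature vector of state i
    alpha   : discount factor in (0, 1)
    """
    rng = random.Random(seed)
    theta = [0.0] * len(phi[0])
    s = 0  # arbitrary start state
    for t in range(n_steps):
        # Sample the next state from row s of P, so that updates follow
        # a single simulated trajectory of the chain.
        r, acc, s_next = rng.random(), 0.0, len(P) - 1
        for j, p in enumerate(P[s]):
            acc += p
            if r < acc:
                s_next = j
                break
        v = sum(th * f for th, f in zip(theta, phi[s]))
        v_next = sum(th * f for th, f in zip(theta, phi[s_next]))
        d = g[s] + alpha * v_next - v   # temporal difference
        step = step0 * c / (c + t)      # diminishing step size (assumed form)
        theta = [th + step * d * f for th, f in zip(theta, phi[s])]
        s = s_next
    return theta


if __name__ == "__main__":
    # Hypothetical 3-state chain with two features per state.
    P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    g = [0.0, 1.0, 2.0]
    phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    print(td0_linear(P, g, phi, alpha=0.9, n_steps=20000))
```

Sampling each next state from the chain itself is the design choice that matters here: the abstract states that divergence may occur when updates are not based on trajectories of the Markov chain.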

Related resources

Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation

Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation on nonlinear mathematics functions. However, the majority of the models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...

Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain, using a function approximator involving linear combinations of fixed basis functions. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an ergodic Markov chain with a finite or infi...

On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence...

Experimental analysis of eligibility traces strategies in temporal difference learning

Temporal difference (TD) learning is a model-free reinforcement learning technique, which adopts an infinite horizon discount model and uses an incremental learning technique for dynamic programming. The state value function is updated in terms of sample episodes. Utilising eligibility traces is a key mechanism in enhancing the rate of convergence. TD(λ) represents the use of eligibility traces...

Efficient Asymptotic Approximation in Temporal Difference Learning

Frédérick Garcia and Florent Serre. TD(λ) is an algorithm that learns the value function associated to a policy in a Markov Decision Process (MDP). We propose in this paper an asymptotic approximation of online TD(λ) with accumulating eligibility trace, called ATD(λ). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(λ) and to op...

Publication date: 1998
